Web Page Classification using Anchor-related Text Extracted by a DOM-based Method



Related Resources

Topical Web Crawling Using Weighted Anchor Text and Web Page Change Detection Techniques

In this paper, we discuss the focused web crawler and the relevance of anchor text, as well as a method for web page change detection for search engines. We propose a technique called weighted anchor text, which uses the link structure to form a weighted directed graph of anchor texts. These weights are then used to decide the relevance of web pages as the indexing of these pages...


A DOM-based Anchor-Hop-T Method for Web Application Information Extraction

To implement information fusion for electronic products, the widely adopted approach is to extract information from the HTML structure of business websites using deep data processing. However, modeling a Web application is hard because the data in HTML is semi-formal and is displayed as a DOM (Document Object Model) tree when an XML schema is used for data analysis. How to understand and...


A Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification

In this paper, a novel filter-based approach is proposed that uses the PageRank algorithm to select the optimal subset of features and to compute their weights for web page classification. To evaluate the proposed approach, multiple experiments are performed on four different datasets (WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups), using accuracy score as the main criterion. By analy...


Extracting Related Words from Anchor Text Clusters by Focusing on the Page Designer's Intention

Approaches that extract related words (terms) by co-occurrence sometimes work poorly. Two words that frequently co-occur in the same documents are considered related. However, they may not be related at all, because they may have neither common meanings nor similar semantics. We address this problem by considering the page designer’s intention and propose a new model to extract related words. Our appr...


Automatic Web-Page Classification by Using Machine Learning Methods

This paper describes automatic Web-page classification using machine learning methods. Recently, the importance of portal site services, including search engine functions on the World Wide Web, has been increasing. In particular, portal sites such as Yahoo!, which hierarchically classify Web pages into many categories, are becoming popular. However, the classification of Web pages into each...



Journal

Journal title: Transactions of the Japanese Society for Artificial Intelligence

Year: 2010

ISSN: 1346-0714,1346-8030

DOI: 10.1527/tjsai.25.37